(57) Abstract: The present invention introduces data path architectures of processing devices with a distributed register file. These 
data path architectures are obtained by applying a set of building rules to a set of building blocks. The number of register files 
files to the processing elements via distributed crossbars are given. Arrays of processing devices are considered as well. Data path 

A data processing device with distributed register file 



1. Field of the invention 

The present invention relates to the field of architecture design of data processing devices. More 
specifically, the invention deals with architecture design issues at register-transfer level and 
focuses on data path architectures of processing devices. 



2. Conventions, definition of terms, terminology 

First, it should be noted that in the literature the two expressions 'distributed register file' and 'distributed 
register files' (files with an 's') are used synonymously and stand for two or more register files. 

The term 'data processing device' has a very broad meaning and can stand for terms like 
(micro)processor, micro-controller, central processing unit (CPU), digital signal processor (DSP), 
application specific integrated circuit (ASIC), application specific standard product (ASSP), application 
specific instruction set processor (ASIP). As mentioned before, the present invention deals with 
architecture design issues at register-transfer level. A register-transfer-level architecture of a processing 
device can be thought of as consisting of a limited number of elementary building blocks from which the 
processing device is built. The register-transfer-level architecture of a processing device typically 
consists of Processing Elements (PEs), register files, busses, crossbars and a control unit, which are 
arranged and connected to each other in a well-defined manner. The way in which these building blocks are 
arranged and connected together determines the features of the architecture of such a processing 
device. The term 'PE' is frequently used in the same sense as 'data processing device'. However, in the 
text that follows, the term 'PE' has a more restricted meaning and will represent, unless specified 
otherwise, either Arithmetic Logic Units (ALUs), floating point units (FPUs) or other functional units 
(FUs) of a processing device. A crossbar is a building block that makes connections between its inputs 
and outputs. A fully connected crossbar is able to connect any input to one, more or even all outputs. A 
partially connected crossbar is able to connect any input to one or more but not all outputs. 
Multiplexers/demultiplexers are crossbars with one output/input and one or more inputs/outputs 
respectively. The meaning of the other aforementioned building blocks is identical to the one normally 
described in the literature. 
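
The distinction between fully and partially connected crossbars can be illustrated with a minimal Python sketch. The class name `Crossbar`, its methods and the representation of control signals as a list of selected input indices are assumptions made for illustration only; they do not appear in the figures.

```python
class Crossbar:
    """Toy crossbar: `allowed` holds the (input, output) pairs that can be switched."""

    def __init__(self, n_inputs, n_outputs, allowed=None):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        full = {(i, o) for i in range(n_inputs) for o in range(n_outputs)}
        # Without an explicit list, every input can reach every output.
        self.allowed = full if allowed is None else set(allowed)
        if not self.allowed <= full:
            raise ValueError("connection refers to a non-existent port")

    def is_fully_connected(self):
        """True if any input can be connected to any output."""
        return len(self.allowed) == self.n_inputs * self.n_outputs

    def route(self, select, values):
        """select[o] names the input driving output o (the control signals)."""
        outputs = []
        for o, i in enumerate(select):
            if (i, o) not in self.allowed:
                raise ValueError(f"input {i} cannot reach output {o}")
            outputs.append(values[i])
        return outputs


full = Crossbar(3, 2)
partial = Crossbar(2, 2, allowed=[(0, 0), (1, 1)])
routed = full.route([2, 0], ["a", "b", "c"])  # output 0 <- input 2, output 1 <- input 0
```

In this sketch one input may drive several outputs at once, which matches the definition of a fully connected crossbar given above.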

The (register-transfer-level) data path architecture of a processing device comprises only building blocks 
directly involved in the data processing, e.g. PEs, register files, busses and crossbars, but not any 
control units used to control the building blocks of the data path. Therefore, in all the following figures, 
control signals for crossbars, PEs and register files will only be shown when they are relevant in the 
context of the present invention. Furthermore, in all the figures that follow, arrows represent either 
bussed connections between building blocks or bussed inputs and bussed outputs of building blocks 
and processing devices, where the bus width of a bussed connection or of a bussed input/output is 
equal to one or more bits. Unless specified otherwise, all the inputs and outputs of building blocks and 
of a processing device itself refer to data and not to control signals. It is assumed that all the control 
signals for all the building blocks are generated by one or more control units of the processing device, 
these control units typically comprising instruction decode and execution units as well as memory 
management units. Control signals for register files, e.g., determine the addresses of the register 
locations to/from which data are written/read respectively, or represent clocking signals. Register file 
inputs are also called write ports and register file outputs are also called read ports. Read/write ports 
may have simultaneous access to all register locations in the register file. Control signals for PEs, e.g., 
select the operations to be performed. Control signals for crossbars determine the connections to be 
made between crossbar inputs and crossbar outputs. 



3. Prior Art 

Before investigating the prior art in processor architectures with a distributed register file, it is worthwhile 
to have in mind the data path architecture of a 'conventional' processor with a single register file as it is 
used in today's microprocessors and as shown in figure 1. It is characterized by the fact that all the PE 
outputs are connected to one and the same register file and that all the PEs may have simultaneous access 
(for reading and writing data) to any register location in the register file. 

Figures 2, 3 and 4 can be used to retrace briefly the evolutionary steps of register-transfer-level 
architectures of processing devices with a distributed register file. 

Figure 2 shows one of the first data path architectures of a processing device with a distributed register 
file. It was called the polycyclic processor and was developed by ESL Inc. in the early 80's. The data 
path architecture at register-transfer level is shown and consists of a set of PEs whose inputs and 
outputs are connected to a crossbar with delay elements at each cross point. The delay elements can 
be thought of as a particular implementation of a register file. PEs, crossbar and delay elements are 
connected in the following way: (1) each PE (data) output is connected to as many independent cross 
points in a row of the crossbar as there are PE inputs; (2) each PE input is connected to and selected 
out of as many cross points in a column as there are PE outputs. For the configuration shown in 
figure 2 (two PEs, each having 2 inputs and 1 output) this implies a fully connected crossbar with 
4x2 independent cross points, in other words 2 rows with 4 cross points each or equivalently 4 columns 
with 2 cross points each, and with as many delay elements as cross points. Note that this is drawn 
symbolically in figure 2. The detailed architecture of the crossbar with the delay elements is not shown. 
It is important to note that ESL Inc. had microprocessors in mind when speaking of PEs. Therefore, the 
crossbar with delay elements is first of all an efficient method of exchanging data between 
microprocessors, hence an efficient method to build multi-processor systems. A next step in the 
evolution of data path architectures with a distributed register file consisted in integrating the crossbar 
with delay elements as shown in figure 2 directly into the data path architecture of a microprocessor. 
This step was taken in the data path architecture (again at register-transfer level) of a video signal 
processor which was developed in the late 80's by Philips Research and which is shown in figure 3. 
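
The cross point count quoted for the configuration of figure 2 can be reproduced with a short Python sketch. The function name `polycyclic_cross_points` and the representation of each PE as an (inputs, outputs) pair are hypothetical and chosen only for this illustration.

```python
def polycyclic_cross_points(pes):
    """pes: list of (n_inputs, n_outputs) pairs, one pair per PE.

    Returns (rows, columns, cross_points) for a fully connected crossbar
    with one row per PE output and one column per PE input.
    """
    rows = sum(n_out for _, n_out in pes)    # one row per PE output
    columns = sum(n_in for n_in, _ in pes)   # one column per PE input
    return rows, columns, rows * columns


# Two PEs, each with 2 inputs and 1 output, as in figure 2.
print(polycyclic_cross_points([(2, 1), (2, 1)]))  # (2, 4, 8)
```

The third value is also the number of delay elements, since there is one delay element per cross point. Because the count is the product of all PE outputs and all PE inputs, it grows quickly as PEs are added.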

Although the terminology used in figure 3 is slightly different from that used in figure 2, the building 
blocks in question and the way in which they are connected together are identical: in figure 3 the 
crossbar is called a switch matrix, the delay elements are called Silos, the PEs are called ALEs 
(Arithmetic Logic Elements), where ALE is yet another word for ALU. In figure 3, the Silos are used for 
slightly different data storage purposes: 1) as Memory Elements (MEs) which contain, in addition to the 
Silos, conventional memory for program data and logic for address calculation; 2) as Buffer Elements 
(BEs) for buffering data; 3) as Output Elements (OEs) for buffering data before they leave the processor. 
As mentioned above, the way in which these building blocks are connected together is the same as in 
figure 2, with the only difference lying in a more explicit separation and drawing of crossbar (switch 
matrix) and delay elements (Silos). 

Finally, replacing delay elements (Silos) with conventional register files leads to a data path architecture 
with distributed register file as shown in figure 4. In figure 4, the PEs can be of different types and can have 
several data inputs and data outputs. The register files can be of different types as well, such as 
stacks, FIFOs and register files with a rotating property where the data rotate in the register file, and they 
may have several read and write ports from and to which data can be read and written simultaneously. 
The crossbar can be fully or only partially connected. Furthermore, outputs of register files may be 
connected to PE inputs and/or to processor outputs. 

It is interesting to see that the way in which the building blocks, consisting of crossbar, register files 
(Silos, delay elements) and PEs (microprocessors, ALUs, ALEs), are connected together in figures 2, 3 
and 4 appears to be identical and is based on the following rules: 1) take the data outputs of the PEs 
(ALUs, ALEs) and connect them to the crossbar inputs; 2) take the crossbar outputs and connect them 
to the inputs of the register files, delay elements and Silos; 3) take the outputs of the register files, delay 
elements and Silos and connect them to the (data) inputs of the PEs. 

Before closing this section on the prior art in data path architectures of processing devices with a 
distributed register file, their shortcomings and major points for improvement will be briefly discussed. 

Two major shortcomings of a 'conventional' data path architecture with a single register file are the VLSI 
design challenge of the single register file and the power consumption of the single register file. Today's 
microprocessors (e.g. Pentium, PowerPC) have single register files containing at least 128 80-bit 
floating point registers and having at least 4 read and 4 write ports. This leads to a large silicon area of 
the register file, which leads in its turn to an increase in read/write/access cycle times due to long wire 
lines to be charged and discharged. In order to compensate for this effect, special design techniques 
have to be utilized in order to keep the read/write/access cycle times down to an acceptable level. This 
however, together with the large silicon area, goes to the detriment of power consumption and therefore 
big single register files with multiple read/write ports are not very power efficient. 

Data path architectures with a distributed register file try to overcome these shortcomings by using several 
smaller register files with only a few read/write ports. All these register files together are of about 
the same size as a big single register file. However, in the case of a data path architecture like that in figure 4, 
the price that is paid to overcome the problems linked to a single register file consists in bigger code 
size. This is due to the fact that data path architectures with distributed register files are typically VLIW 
processor architectures where a compiler optimizes the program code statically in order to optimally 
exploit the multiple register files. For a certain number of reasons, however, the program code of VLIW 
processors is typically twice as large as for 'conventional' processors (processing devices) with a single 
register file. 

Another major point for improvement of processing devices with a single register file as well as with a 
distributed register file concerns the implementation costs. It was already mentioned that, for a certain 
number of reasons, single register files are always of big size and therefore have high implementation 
costs in terms of silicon area. However, the same is true for distributed register files if they are used as in 
figure 4 because they require a large crossbar to make the connections between the multiple register 
files and the PEs. 

It is the goal of the present invention to overcome these shortcomings of existing data path architectures 
with a single register file as well as with a distributed register file. 

4. Brief description of the drawings 

Figure 1 shows the data path architecture of a 'conventional' processor with a single register file. 
Figure 2 shows the data path architecture of the polycyclic processor developed by ESL Inc. 
Figure 3 shows the data path architecture of a video signal processor developed by Philips Research. 
Figure 4 shows the data path architecture of a processor with a distributed register file according to the 
prior art. 

Figure 5 shows the data path architecture of a processing device with a distributed register file 
based on the present invention. 

Figure 6 shows a specific example of the data path architecture of a processing device with a distributed 
register file based on the present invention. 

Figure 7 shows two variants of a specific type of register file containing a shift register connected to a 
crossbar. One variant of this type of register file is shown at lower right, the other variant is shown at 
lower left. 

Figure 8 shows a specific example of an array of processing devices built up according to the rules 
based on the present invention. 

Figure 9 shows two processing devices of an array and visualizes the rules concerning a) processing 
device inputs which are connected to an array input and b) processing device inputs which are 
connected to an output of a processing device of the array. 



5. Detailed description of the drawings 

The main aspects of the present invention are described by referring to the figures mentioned in this 
section. 

Data path architectures of processing devices with a distributed register file based on the present 
invention differ significantly from the data path architectures of the prior art and are obtained by applying 
a set of building rules to a set of building blocks. The differences with the prior art will become clear 
when discussing these building rules. 

Considered is a processing device comprising one or more inputs, one or more outputs and one or 
more processing elements, each processing element having one or more inputs and one or more 
outputs. In the following, unless mentioned explicitly, the terms 'data path architecture', 'crossbar', 
'register file' and 'processing element' always refer to the considered processing device. 

The first type of data path architecture of a processing device based on the present invention contains : 

(a) as many register files as there are processing device inputs and processing element 
outputs, where processing element outputs correspond to all the outputs of all the 
processing elements of the considered processing device and where each register file 
has one input and one or more outputs 

(b) as many crossbars as there are processing device outputs and processing element inputs, 
where processing element inputs correspond to all the inputs of all the processing elements 
of the considered processing device and where each crossbar has one output and 
one or more inputs 

and has a register-transfer-level data path architecture which is built up according to the following rules : 

(c) each processing device input is connected to the input of a register file 

(d) any processing device input and any other processing device input are not connected to the 
same input of a register file 

(e) each output of each processing element is connected to the input of a register file 

(f) any output of any processing element and any other output of any processing element are 
not connected to the same input of a register file 

(g) the input of each register file is connected either to an output of a processing element or to 
a processing device input 

(h) the input of any register file and the input of any other register file are not connected to a 
same output of any processing element 

(i) the input of any register file and the input of any other register file are not connected to a 
same processing device input 

(j) each output of each register file is connected to an input of a crossbar 

(k) any output of any register file and any other output of any other register file are not 
connected to a same input of a crossbar 

(l) each input of each crossbar is connected to an output of a register file 

(m) any input of any crossbar and any input of any other crossbar are not connected to the 
same output of any register file 

(n) each processing device output is connected to the output of a crossbar 

(o) each input of each processing element is connected to the output of a crossbar 

(p) any processing device output and any other processing device output are not connected to 
the same output of a crossbar 

(q) any processing element input and any other processing element input are not connected to 
the same output of a crossbar 

(r) the output of each crossbar is connected either to a processing device output or to an input 
of a processing element 

(s) the output of any crossbar and the output of any other crossbar are neither connected to a 
same processing device output nor to a same input of a processing element 

Note that the rules as described in (c)-(s) do not imply that each output of each register file has 
necessarily to be connected to all the inputs of all the crossbars. It is left up to the designer to decide 
which connections between outputs of register files and inputs of crossbars to implement. 
Therefore, any crossbar has as many inputs as there are outputs of register files connected to that 
crossbar. 
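
One way to read the building rules is as one-to-one pairings that can be checked mechanically on a netlist. The following Python sketch is an illustration under stated assumptions: the function name `check_first_type`, the netlist representation (dictionaries and triples) and all port names are hypothetical, and the check on read ports is slightly stricter than rule (m), since it also rejects a read port wired twice to the same crossbar.

```python
from collections import Counter


def check_first_type(sources, sinks, rf_source, xbar_sink, links):
    """Return a list of rule violations; an empty list means none were found.

    sources   -- names of all processing device inputs and PE outputs
    sinks     -- names of all processing device outputs and PE inputs
    rf_source -- dict: register file -> source wired to its single input
    xbar_sink -- dict: crossbar -> sink wired to its single output
    links     -- (register file, read port, crossbar) triples; each triple
                 is one register file output wired to its own crossbar input
    """
    errors = []
    # Rules (c)-(i): sources and register file inputs pair up one-to-one.
    if sorted(rf_source.values()) != sorted(sources):
        errors.append("sources and register file inputs do not pair one-to-one")
    # Rules (n)-(s): crossbar outputs and sinks pair up one-to-one.
    if sorted(xbar_sink.values()) != sorted(sinks):
        errors.append("crossbar outputs and sinks do not pair one-to-one")
    # Rule (m), applied strictly: a read port feeds at most one crossbar input.
    ports = Counter((rf, port) for rf, port, _ in links)
    errors += [f"{rf} port {port} wired {n} times"
               for (rf, port), n in ports.items() if n > 1]
    # Rules (j) and (l): no register file or crossbar is left unconnected.
    errors += [f"{rf} has no output wired" for rf in rf_source
               if all(r != rf for r, _, _ in links)]
    errors += [f"{xb} has no input wired" for xb in xbar_sink
               if all(x != xb for _, _, x in links)]
    return errors


# Hypothetical device: one input, one output, one PE with 1 input and 1 output.
sources = ["dev.in0", "pe0.out0"]
sinks = ["dev.out0", "pe0.in0"]
rf_source = {"rf0": "dev.in0", "rf1": "pe0.out0"}
xbar_sink = {"xb0": "pe0.in0", "xb1": "dev.out0"}
links = [("rf0", 0, "xb0"), ("rf1", 0, "xb0"), ("rf1", 1, "xb1")]
violations = check_first_type(sources, sinks, rf_source, xbar_sink, links)
```

In the example, `rf1` has two read ports and `xb0` has two inputs, while `rf0` is not wired to `xb1`; this is permitted because not every allowed connection has to be implemented.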

Furthermore, it should be noted that normally all inputs and all outputs of all register files, crossbars and 
processing elements as well as all processing device inputs and all processing device outputs have the 
same bus width, the bus width being equal to one or more bits; in other words, all connections as 
specified in (c)-(s) have the same bus width, the bus width being equal to one or more bits. However, it is 
also conceivable that the bus width differs from connection to connection and from input/output to 
input/output of building blocks. In the case of PEs, the bus width of the PE inputs may well be different from 
the bus width of the PE outputs, depending on the operations that are performed in the PEs. In the case of a 
crossbar, the connections may be made according to some rule, e.g. connecting only the most/least 
significant bits of an input whose bus width is wider than that of the crossbar output to which the 
connection is made, or e.g. filling the most/least significant bits of a crossbar output, whose bus width is 
wider than that of the crossbar input to which the connection is made, with some specific values. In the 
case of a register file, the data values appearing on one or more inputs of the register file may be 
written/read into/from register locations according to similar rules as for the connections to be made 
inside a crossbar, depending on the bus width of the register file inputs, of the register cells contained in 
the register file and of the register file outputs. 
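
The two width-adaptation rules mentioned for crossbars can be sketched as follows. This is a minimal illustration for unsigned values; the function names `narrow` and `widen`, their parameters and the choice of a constant fill bit as the "specific value" are assumptions, not part of the described architecture.

```python
def narrow(value, src_bits, dst_bits, keep="lsb"):
    """Connect only the least or most significant dst_bits of a wider input."""
    value &= (1 << src_bits) - 1                # treat value as src_bits wide
    if keep == "lsb":
        return value & ((1 << dst_bits) - 1)    # keep the low-order bits
    return value >> (src_bits - dst_bits)       # keep the high-order bits


def widen(value, src_bits, dst_bits, fill_side="msb", fill_bit=0):
    """Place a narrower input on a wider output and fill the unused bits."""
    extra = dst_bits - src_bits
    pad = (1 << extra) - 1 if fill_bit else 0   # all ones or all zeros
    value &= (1 << src_bits) - 1
    if fill_side == "msb":
        return (pad << src_bits) | value        # fill the high-order bits
    return (value << extra) | pad               # fill the low-order bits


low = narrow(0b10110110, 8, 4, keep="lsb")      # 0b0110
high = narrow(0b10110110, 8, 4, keep="msb")     # 0b1011
padded = widen(0b1011, 4, 8)                    # 0b00001011
```

Other fill values, such as sign extension, would be equally compatible with the text, which only requires that some rule be fixed per connection.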

A processing device with a data path architecture built up according to the rules mentioned above is 
shown in figure 5. Figure 5 aims at visualizing the above rules; therefore the number of processing 
device inputs and outputs, the number of PEs as well as the number of PE inputs and PE outputs are not 
further specified. In contrast, figure 6 shows a specific example of a processing device with such a data 
path architecture: it contains two PEs, two processing device inputs and two processing device outputs. 
Each PE has two inputs and two outputs. Register files have either one, two or three outputs. 
Furthermore, the number of existing connections between outputs of register files and inputs of crossbars 
differs from register file to register file and from crossbar to crossbar; in other words, not all connections 
that are allowed by the rules are effectively realized. 
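
The number of register files and crossbars implied by the building blocks of the first type can be worked out for the configuration of figure 6. The function name `first_type_counts` and the (inputs, outputs) pair per PE are hypothetical conventions used only in this sketch.

```python
def first_type_counts(n_device_inputs, n_device_outputs, pes):
    """pes: list of (n_inputs, n_outputs) pairs, one pair per PE.

    Returns (register_files, crossbars) for the first type of data path.
    """
    n_pe_inputs = sum(n_in for n_in, _ in pes)
    n_pe_outputs = sum(n_out for _, n_out in pes)
    register_files = n_device_inputs + n_pe_outputs   # one per data source
    crossbars = n_device_outputs + n_pe_inputs        # one per data sink
    return register_files, crossbars


# Figure 6: two device inputs, two device outputs, two PEs with 2 inputs
# and 2 outputs each.
print(first_type_counts(2, 2, [(2, 2), (2, 2)]))  # (6, 6)
```

Each of the six crossbars has a single output, so each behaves like a selector in front of one PE input or one device output, in contrast to the single large crossbar of figure 4.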

The second type of data path architecture of a processing device based on the present invention differs 
slightly from the first type in that this second type of data path architecture contains one or more 
register files of a same type, this type of register file being shown in figure 7 and denoted by 'SR + #'. 
There are basically two slightly different variants of this type of register file, one variant shown at lower 
right in figure 7 and the other variant shown at lower left in figure 7. Both variants contain a shift register 
and a crossbar, the crossbar being denoted by '#' in figure 7, and where 

(a) the shift register contains one or more register cells 

(b) the shift register has one input and as many outputs as there are register cells 
contained in the shift register, each register cell having one input and one output 

(c) the crossbar is either partially or fully connected and has as many outputs as there are 
register file outputs 

(d) in the case of one variant, the crossbar has as many inputs as there are register cells 
contained in the shift register 

(e) in the case of the other variant, the crossbar has as many inputs as the number obtained 
by incrementing by one the number of register cells contained in the shift register 

(f) in the case of one variant, the register file input is connected to the input of the shift 
register 

(g) in the case of the other variant, the register file input is connected to the input of the shift 
register and to an input of the crossbar 

(h) the input of the shift register is connected to the input of the first register cell of the shift 
register 

(i) the inputs and outputs of the register cells of the shift register are connected in such a 
way as to form a shift register 

(j) the output of each register cell of the shift register is connected to a shift register output 

(k) the output of any register cell of the shift register and the output of any other register 
cell of the shift register are not connected to a same shift register output 

(l) each shift register output is connected to an input of the crossbar 

(m) any shift register output and any other shift register output are not connected to a same 
input of the crossbar 

(n) in the case of one variant, each input of the crossbar is connected to a shift register output 

(o) in the case of the other variant, each input of the crossbar is connected either to a shift 
register output or to the register file input 

(p) any input of the crossbar and any other input of the crossbar are neither connected to 
the same shift register output nor to the register file input 

(q) each output of the crossbar is connected to a register file output 

(r) any output of the crossbar and any other output of the crossbar are not connected to a 
same register file output 

(s) each register file output is connected to a crossbar output 

(t) any register file output and any other register file output are not connected to a same 
crossbar output 

The difference between the two variants lies in the fact that, in the case of the variant shown at lower left in 
figure 7, the register file input can be forwarded directly to one or more register file outputs without 
traversing a cell of the shift register. Furthermore, the shift register contained in the register file may 
have a gated clock input; in other words, the contents of the register cells are shifted by one 
position in the shift direction within a clock cycle of some clock used in the processing device only if 
some signals generated in the control unit(s) of the processing device have a specific value. The value 
of these signals may change from clock cycle to clock cycle of some clock used in the processing 
device and generally depends on the program code, on the instructions that are executed by the 
processing device, on results of operations performed by the PEs and on data values stored in the 
register files. As mentioned before, a shift register with m cells has a 'shift direction'; in other words, 
there exists an increasing order of register cells labeled 1, 2, ..., m such that when the shift register is 
clocked the content of the register cell with label i is shifted into the register cell with label i+1, for i = 1, 2, ..., m-1. 
The first cell of the shift register is the register cell with label 1; the last cell of the shift register is the cell 
with label m. Note that concerning the bus width of any connections between any inputs and outputs of 
the shift register, of the crossbar, of any register cell of the shift register and of the register file itself, the 
same remark holds as for the connections made inside a processing device with a data path architecture 
of the first type as described above. 
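
The behaviour of the 'SR + #' register file, including the gated clock and the two variants, can be sketched in Python. The class name `ShiftRegisterFile`, the `bypass` flag, the zero initial cell contents and the assumption of a fully connected internal crossbar are illustrative choices and are not taken from figure 7.

```python
class ShiftRegisterFile:
    """Toy 'SR + #' register file: a shift register read through a crossbar."""

    def __init__(self, n_cells, bypass=False):
        self.cells = [0] * n_cells   # cells[0] is the cell with label 1
        self.bypass = bypass         # True: crossbar also sees the file input

    def clock(self, data_in, shift_enable=True):
        """One clock cycle; with a gated clock nothing moves unless enabled."""
        if shift_enable:
            # Cell i receives the old content of cell i-1; the last one drops out.
            self.cells = [data_in] + self.cells[:-1]

    def read(self, selects, data_in=None):
        """selects[k] picks the crossbar input routed to register file output k.

        Crossbar input i (0-based) is the cell with label i+1. In the bypass
        variant the extra input with index n_cells is the register file input.
        """
        taps = list(self.cells)
        if self.bypass:
            taps.append(data_in)
        return [taps[s] for s in selects]


sr = ShiftRegisterFile(3)
sr.clock(5)
sr.clock(7)
sr.clock(9, shift_enable=False)      # gated clock: contents stay in place
two_outputs = sr.read([0, 1])        # [7, 5]

fwd = ShiftRegisterFile(2, bypass=True)
fwd.clock(1)
forwarded = fwd.read([2, 0], data_in=4)   # [4, 1]: input forwarded directly
```

Since the write address is implicit in the shift direction, only the read side needs selection signals in this model.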

The present invention also deals with arrays of processing devices. The data path architecture of 
the processing devices used in these arrays is closely related to the data path architecture of the first 
and second type as described above. 

Considered is an array comprising two or more processing devices and one or more array inputs and 
one or more array outputs. Each processing device of the considered array has one or more inputs and 
one or more outputs. In the following, unless mentioned explicitly, the term 'processing device' always 
refers to a processing device of the considered array. 

The array is built up according to the following rules: 

(a) each array input is connected to one or more inputs of one or more processing devices 

(b) any array input and any other array input are not connected to a same input of a 
processing device 

(c) each output of each processing device is connected to one or more inputs of one or 
more processing devices or to one or more array outputs 

(d) any output of any processing device and any output of any other processing device are 
neither connected to a same input of a processing device nor to a same array output 


Here again, concerning the bus width of any connections between any inputs and outputs of the array 
itself and/or of any processing devices of the array, the same remark holds as for the connections made 
inside a processing device with a data path architecture of the first or second type as described above. 
Finally, it should be noted that the rules as described in (a)-(d) allow for regular and irregular 
connections, as is exemplified by the array shown in figure 8. 

Furthermore, the rules as described in (a)-(d) do not imply that all possible connections, which are 
allowed by the rules, between inputs/outputs of processing devices and array inputs/outputs are 
effectively realized. It is left up to the designer to decide which connections to implement. 
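
The array rules amount to requiring that every driver is used at least once and that no destination has two different drivers. A minimal Python sketch of such a check follows; the function name `check_array`, the wire list representation and the port names are hypothetical. The sketch also rejects an input driven by both an array input and a processing device output, which is an assumption that goes slightly beyond rules (b) and (d) as literally stated.

```python
def check_array(array_inputs, device_outputs, wires):
    """wires: list of (driver, destination) pairs.

    Drivers are array inputs or processing device outputs; destinations are
    processing device inputs or array outputs. Returns a list of violations.
    """
    errors = []
    driven_by = {}
    for driver, dest in wires:
        # Rules (b) and (d): a destination must not have two different drivers.
        if dest in driven_by and driven_by[dest] != driver:
            errors.append(f"{dest} driven by both {driven_by[dest]} and {driver}")
        driven_by.setdefault(dest, driver)
    # Rules (a) and (c): every array input and device output goes somewhere.
    used = {driver for driver, _ in wires}
    for driver in list(array_inputs) + list(device_outputs):
        if driver not in used:
            errors.append(f"{driver} is not connected")
    return errors


# Hypothetical chain of two processing devices P0 and P1.
array_inputs = ["A.in0"]
device_outputs = ["P0.out0", "P1.out0"]
wires = [("A.in0", "P0.in0"), ("P0.out0", "P1.in0"), ("P1.out0", "A.out0")]
result = check_array(array_inputs, device_outputs, wires)
```

A driver may appear in several wires, which corresponds to one output feeding several inputs; only the destinations must be unique.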

The first and second type of data path architecture as described above are used inside 'stand-alone' 
processing devices, in other words processing devices which are not part of an array of several 
processing devices. The type of data path architecture of processing devices which are part of an array 
differs slightly from the first and second type of data path architecture of a 'stand-alone' processing 
device. The difference consists in the number of register files used inside each processing device of the 
array as well as in the way that inputs of register files are connected to processing device inputs. In a 
few words, the difference is as follows: if an input of any processing device of the considered array is 
not connected to an output of a processing device of the considered array but is connected to an array 
input, then it is connected to the input of a register file in the same way as for the data path architecture 
of the first and second type described above; if an input of any processing device of the considered 
array is connected to an output of a processing device of the considered array but is not connected to 
an array input, then it is directly connected to one or more inputs of one or more crossbars of the 
considered processing device. This rule is visualized in figure 9, which shows two processing 
devices of an array. As one can see, the input of the processing device at the right side, which is 
connected to an output of the processing device at the left side, is not connected to an input of a 
register file but directly connected to one or more inputs of one or more crossbars of that processing 
device. On the other hand, the input of the processing device at the right side which is connected to an 
array input is connected to an input of a register file in the same way as for the data path architecture of 
the first or second type described above. 

In the following, unless mentioned explicitly, the terms 'processing device input', 'processing device 
output', 'crossbar(s)', 'register file(s)' and 'processing element(s)' always refer to the considered 
processing device. 

In detail, this means that each processing device of the considered array contains: 

(a) one or more processing device inputs and one or more processing device outputs 

(b) one or more processing elements, each processing element having one or more inputs 
and one or more outputs 

(c) as many register files as the considered processing device has processing element 
outputs and marked processing device inputs, where 

1. processing element outputs correspond to all the outputs of all the processing 
elements of the considered processing device 

2. marked processing device inputs correspond to all those inputs of the 
considered processing device which are connected to an array input 

3. each register file has one input and one or more outputs 

(d) as many crossbars as there are processing device outputs and processing element 
inputs, where processing element inputs correspond to the inputs of all the processing 
elements of the considered processing device and where each crossbar has one output 
and one or more inputs 

and has a register-transfer-level data path architecture which is built up according to the following 
rules : 

(e) each processing device input which is connected to an array input is connected to the 
input of a register file 

(f) each processing device input which is connected to an output of a processing device of 
the considered array is connected to one or more inputs of one or more crossbars 

(g) any processing device input and any other processing device input are neither 
connected to the same input of a register file nor to a same input of a crossbar 

(h) each output of each processing element is connected to the input of a register file 

(i) any output of any processing element and any other output of any processing element 
are not connected to the same input of a register file 

(j) the input of each register file is connected either to an output of a processing element 
or to a processing device input 

(k) the input of any register file and the input of any other register file are not connected to 
a same output of any processing element 

(l) the input of any register file and the input of any other register file are not connected to 
a same processing device input 

(m) each output of each register file is connected to an input of a crossbar 

(n) any output of any register file and any other output of any other register file are not 
connected to a same input of a crossbar 

(o) each input of each crossbar is connected either to an output of a register file or to a 
processing device input 

(p) any input of any crossbar and any input of any other crossbar are neither connected to 
the same output of any register file nor to a same processing device input 

(q) each processing device output is connected to the output of a crossbar 

(r) each input of each processing element is connected to the output of a crossbar 

(s) any processing device output and any other processing device output are not 
connected to the same output of a crossbar 

(t) any processing element input and any other processing element input are not 
connected to the same output of a crossbar 

(u) the output of each crossbar is connected either to a processing device output or to an 
input of a processing element 

(v) the output of any crossbar and the output of any other crossbar are neither connected 
to a same processing device output nor to a same input of a processing element 

Here again, concerning the bus width of any connections between any inputs and outputs of building 
blocks of a processing device of the array, the same remark holds as for the connections done inside a 
processing device with a data path architecture of the first or second type described above. 

Furthermore, concerning any processing device of the array, the rules described in (e)-(v) above do 
not imply that each processing device input (which is connected to an array input) or each output of 
each register file necessarily has to be connected to all the inputs of all the crossbars. It is left up to the 
designer to decide which connections between outputs of register files and inputs of crossbars to 
implement. Therefore, any crossbar has as many inputs as there are outputs of register files 
and processing device inputs connected to that crossbar. 
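
The last remark can be illustrated with a minimal Python sketch. It assumes the designer's choice is given as a list of (source, crossbar) pairs; the function name and the labels such as `rf0.out0` or `xbar0` are hypothetical and serve only as an illustration. The number of inputs of each crossbar then follows directly from the chosen connections.

```python
from collections import defaultdict


def crossbar_input_counts(connections):
    """Return the number of inputs each crossbar needs.

    connections: iterable of (source, crossbar) pairs chosen by the designer,
    where a source is either a register file output or a processing device
    input (hypothetical string labels).
    """
    sources = defaultdict(set)
    for source, crossbar in connections:
        sources[crossbar].add(source)
    # One crossbar input per distinct source connected to that crossbar.
    return {crossbar: len(srcs) for crossbar, srcs in sources.items()}


# Hypothetical partial connectivity: not every source reaches every crossbar.
example_connections = [
    ("rf0.out0", "xbar0"),
    ("rf1.out0", "xbar0"),
    ("pd_in1", "xbar0"),
    ("rf1.out1", "xbar1"),
]
counts = crossbar_input_counts(example_connections)
```

In this example `xbar0` needs three inputs and `xbar1` needs one, since only the connections actually implemented contribute an input.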

It should be mentioned that current semiconductor process technology allows arrays 
containing several processing devices to be integrated onto a single chip. The application domain of 'stand alone' 
processing devices with a data path architecture based on the present invention is the same as the 
application domain of arrays of processing devices with data path architectures based on the present 
invention and consists of applications within image/multimedia/signal processing, graphics processing 
and linear algebra. 

Before closing this section, it is important to mention that, for a number of reasons concerning 
code density, compiler optimization, power consumption and computing performance, it is 
particularly interesting to let all the processing elements of a processing device with a data path 
architecture based on the present invention be of the same type, 
whether the considered processing device is a 'stand alone' processing 
device or is part of an array of processing devices based on 
the present invention. 

6. Summary of the invention 

The present invention concerns a processing device according to claim 1 and an array of processing 
devices according to claim 8. 

Claims 

What is claimed is : 

1. A processing device comprising : 

(a) one or more processing device inputs 

(b) one or more processing device outputs 

(c) one or more processing elements, each processing element having one or more inputs 
and one or more outputs 

(d) as many register files as there are processing device inputs and processing element 
outputs, where processing element outputs correspond to all the outputs of all the 
processing elements of the considered processing device and where all the register 
files each have one input and one or more outputs 

(e) as many crossbars as there are processing device outputs and processing element 
inputs, where processing element inputs correspond to all the inputs of all the 
processing elements of the considered processing device and where all the crossbars 
each have one output and one or more inputs 

and having a register-transfer-level data path architecture which is built up according to the following 
rules : 

(f) each processing device input is connected to the input of a register file 

(g) any processing device input and any other processing device input are not connected to 
the same input of a register file 

(h) each output of each processing element is connected to the input of a register file 

(i) any output of any processing element and any other output of any processing element 
are not connected to the same input of a register file 

(j) the input of each register file is connected either to an output of a processing element 
or to a processing device input 

(k) the input of any register file and the input of any other register file are not connected to 
a same output of any processing element 

(l) the input of any register file and the input of any other register file are not connected to 
a same processing device input 

(m) each output of each register file is connected to an input of a crossbar 

(n) any output of any register file and any other output of any other register file are not 
connected to a same input of a crossbar 

(o) each input of each crossbar is connected to an output of a register file 

(p) any input of any crossbar and any input of any other crossbar are not connected to the 
same output of any register file 

(q) each processing device output is connected to the output of a crossbar 

(r) each input of each processing element is connected to the output of a crossbar 

(s) any processing device output and any other processing device output are not 
connected to the same output of a crossbar 

(t) any processing element input and any other processing element input are not 
connected to the same output of a crossbar 

(u) the output of each crossbar is connected either to a processing device output or to an 
input of a processing element 

(v) the output of any crossbar and the output of any other crossbar are neither connected 
to a same processing device output nor to a same input of a processing element 
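
The counting rule of items (d) and (e) can be illustrated with a minimal Python sketch. The class and function names are hypothetical and serve only as an illustration: one register file is allotted per processing device input and per processing element output, and one crossbar per processing device output and per processing element input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingElement:
    """Hypothetical description of a processing element by its port counts."""
    num_inputs: int
    num_outputs: int


def building_block_counts(num_device_inputs, num_device_outputs, elements):
    """Count the register files and crossbars of a stand-alone processing device."""
    pe_inputs = sum(pe.num_inputs for pe in elements)
    pe_outputs = sum(pe.num_outputs for pe in elements)
    return {
        # One register file per device input and per processing element output.
        "register_files": num_device_inputs + pe_outputs,
        # One crossbar per device output and per processing element input.
        "crossbars": num_device_outputs + pe_inputs,
    }


# Hypothetical device: 2 inputs, 1 output, and two processing elements.
counts = building_block_counts(
    num_device_inputs=2,
    num_device_outputs=1,
    elements=[ProcessingElement(2, 1), ProcessingElement(3, 2)],
)
```

For this hypothetical device the sketch yields 5 register files (2 + 3) and 6 crossbars (1 + 5).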



2. A processing device as claimed in claim 1, where one or more or all register files are of a same 
type, this type of register file comprising : 

(a) a shift register containing one or more register cells, the shift register having one input 
and as many outputs as there are register cells contained in the shift register, each 
register cell having one input and one output 

(b) a crossbar which is either partially or fully connected and which has as many inputs as 
there are register cells contained in the shift register and which has as many outputs as 
there are register file outputs 

and where 

(c) the register file input is connected to the input of the shift register 

(d) the input of the shift register is connected to the input of the first register cell of the shift 
register 

(e) the inputs and outputs of the register cells of the shift register are connected in such a 
way as to form a shift register 

(f) the output of each register cell of the shift register is connected to a shift register output 

(g) the output of any register cell of the shift register and the output of any other register 
cell of the shift register are not connected to a same shift register output 

(h) each shift register output is connected to an input of the crossbar 

(i) any shift register output and any other shift register output are not connected to a same 
input of the crossbar 

(j) each input of the crossbar is connected to a shift register output 

(k) any input of the crossbar and any other input of the crossbar are neither connected to 
the same shift register output nor to the register file input 

(l) each output of the crossbar is connected to a register file output 

(m) any output of the crossbar and any other output of the crossbar are not connected to a 
same register file output 

(n) each register file output is connected to a crossbar output 

(o) any register file output and any other register file output are not connected to a same 
crossbar output 
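
The behaviour of this type of register file can be sketched in Python as follows. This is a behavioural illustration under stated assumptions, not a hardware description: the class name, the `selects` parameter (which register cell the crossbar routes to each register file output) and the use of `None` for an empty cell are all hypothetical.

```python
class ShiftRegisterFile:
    """Register file built from a shift register and a read crossbar."""

    def __init__(self, num_cells, selects):
        # selects[k] is the index of the register cell routed to output k.
        if any(not 0 <= s < num_cells for s in selects):
            raise ValueError("each select must name an existing register cell")
        self.cells = [None] * num_cells
        self.selects = list(selects)

    def write(self, value):
        # The new value enters the first cell; older values shift by one cell
        # and the value in the last cell is dropped.
        self.cells = [value] + self.cells[:-1]

    def read(self):
        # The crossbar presents the selected cells on the register file outputs.
        return [self.cells[s] for s in self.selects]


# Hypothetical instance with 4 register cells and two outputs.
rf = ShiftRegisterFile(num_cells=4, selects=[0, 2])
for value in (10, 20, 30):
    rf.write(value)
outputs = rf.read()
```

After the three writes the cells hold 30, 20, 10 and an empty cell, so the two outputs deliver 30 and 10.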

3. A processing device as claimed in claim 1, where one or more or all register files are of a same 
type, this type of register file comprising : 

(a) a shift register containing one or more register cells, the shift register having one input 
and as many outputs as there are register cells contained in the shift register, each 
register cell having one input and one output 

(b) a crossbar which is either partially or fully connected and which has as many inputs as 
the number obtained by incrementing by one the number of register cells contained in 
the shift register and as many outputs as there are register file outputs 

and where 

(c) the register file input is connected to the input of the shift register and to an input of the 
crossbar 

(d) the input of the shift register is connected to the input of the first register cell of the shift 
register 

(e) the inputs and outputs of the register cells of the shift register are connected in such a 
way as to form a shift register 

(f) the output of each register cell of the shift register is connected to a shift register output 

(g) the output of any register cell of the shift register and the output of any other register 
cell of the shift register are not connected to a same shift register output 

(h) each shift register output is connected to an input of the crossbar 

(i) any shift register output and any other shift register output are not connected to a same 
input of the crossbar 

(j) each input of the crossbar is connected either to a shift register output or to the register 
file input 

(k) any input of the crossbar and any other input of the crossbar are neither connected to 
the same shift register output nor to the register file input 

(l) each output of the crossbar is connected to a register file output 

(m) any output of the crossbar and any other output of the crossbar are not connected to a 
same register file output 

(n) each register file output is connected to a crossbar output 

(o) any register file output and any other register file output are not connected to a same 
crossbar output 
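
The additional crossbar input of this type of register file can be sketched in Python as follows. The function name and the input numbering (index 0 for the register file input, index i for register cell i-1) are hypothetical conventions chosen for illustration; the claim fixes only the number of crossbar inputs, not their order.

```python
def read_with_bypass(cells, current_input, selects):
    """Return the register file outputs when the crossbar also sees the input.

    cells: current contents of the shift register, first cell first.
    current_input: value present on the register file input.
    selects: for each register file output, the chosen crossbar input.
    """
    # The crossbar has one more input than there are register cells.
    crossbar_inputs = [current_input] + list(cells)
    if any(not 0 <= s < len(crossbar_inputs) for s in selects):
        raise ValueError("each select must name an existing crossbar input")
    return [crossbar_inputs[s] for s in selects]


# Hypothetical state: 4 cells holding 20 and 10, with 30 arriving on the input.
outputs = read_with_bypass([20, 10, None, None], current_input=30, selects=[0, 1, 2])
```

Here the value 30 is available on a register file output before it has been shifted into the first register cell.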

4. A processing device as claimed in claim 2, where the shift register of said type of register file contains 
at least 4 register cells 

5. A processing device as claimed in claim 3, where the shift register of said type of register file contains 
at least 4 register cells 

6. A processing device as claimed in claim 4, where said type of register file has at least two outputs 

7. A processing device as claimed in claim 5, where said type of register file has at least two outputs 

8. An array comprising : 

(a) two or more processing devices, each processing device of the considered array having 
one or more inputs and one or more outputs 

(b) one or more array inputs and one or more array outputs 

where in the following, unless mentioned explicitly, the term 'processing device(s)' always refers to the 
considered array and where the array is built up according to the following rules : 

(c) each array input is connected to one or more inputs of one or more processing devices 

(d) any array input and any other array input are not connected to a same input of a processing 
device 

(e) each output of each processing device is connected to one or more inputs of one or more 
processing devices or to one or more array outputs 

(f) any output of any processing device and any output of any other processing device are 
neither connected to a same input of a processing device nor to a same array output 

where in the following, unless mentioned explicitly, the terms 'processing device input', 'processing 
device output', 'crossbar(s)', 'register file(s)' and 'processing element(s)' always refer to the 
considered processing device and where each processing device of the array contains : 

(g) one or more processing device inputs and one or more processing device outputs 

(h) one or more processing elements, each processing element having one or more inputs and 
one or more outputs 

(i) as many register files as the considered processing device has processing element outputs 
and marked processing device inputs, where 

i. processing element outputs correspond to all the outputs of all the 
processing elements of the considered processing device 

ii. marked processing device inputs correspond to all those inputs of the 
considered processing device which are connected to an array input 

iii. each register file has one input and one or more outputs 

(j) as many crossbars as there are processing device outputs and processing element inputs, 
where processing element inputs correspond to the inputs of all the processing elements of 
the considered processing device and where each crossbar has one output and one or 
more inputs 

and where each processing device of the array has a register-transfer-level data path architecture 
which is built up according to the following rules : 

(k) each processing device input which is connected to an array input is connected to the input 
of a register file 

(l) each processing device input which is connected to an output of a processing device of the 
considered array is connected to one or more inputs of one or more crossbars 

(m) any processing device input and any other processing device input are neither connected to 
the same input of a register file nor to a same input of a crossbar 

(n) each output of each processing element is connected to the input of a register file 

(o) any output of any processing element and any other output of any processing element are 
not connected to the same input of a register file 

(p) the input of each register file is connected either to an output of a processing element or to 
a processing device input 

(q) the input of any register file and the input of any other register file are not connected to a 
same output of any processing element 

(r) the input of any register file and the input of any other register file are not connected to a 
same processing device input 

(s) each output of each register file is connected to an input of a crossbar 

(t) any output of any register file and any other output of any other register file are not 
connected to a same input of a crossbar 

(u) each input of each crossbar is connected either to an output of a register file or to a 
processing device input 

(v) any input of any crossbar and any input of any other crossbar are neither connected to the 
same output of any register file nor to a same processing device input 

(w) each processing device output is connected to the output of a crossbar 

(x) each input of each processing element is connected to the output of a crossbar 

(y) any processing device output and any other processing device output are not connected to 
the same output of a crossbar 

(z) any processing element input and any other processing element input are not connected to 
the same output of a crossbar 

(aa) the output of each crossbar is connected either to a processing device output or to an input 
of a processing element 

(bb) the output of any crossbar and the output of any other crossbar are neither connected to a 
same processing device output nor to a same input of a processing element 
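
The distinction between marked and unmarked processing device inputs in item (i) can be illustrated with a minimal Python sketch. The function name and the labels "array" and "device" are hypothetical: an input driven by an array input is marked and receives a register file, while an input driven by the output of another processing device goes directly to crossbar inputs and receives none.

```python
def register_files_in_array_device(input_drivers, num_pe_outputs):
    """Count the register files of one processing device of an array.

    input_drivers: dict mapping each processing device input to "array"
    (driven by an array input) or "device" (driven by the output of a
    processing device of the array).
    num_pe_outputs: total number of outputs of all processing elements.
    """
    for name, driver in input_drivers.items():
        if driver not in ("array", "device"):
            raise ValueError(f"unknown driver for input {name!r}: {driver!r}")
    marked = [name for name, driver in input_drivers.items() if driver == "array"]
    # One register file per processing element output and per marked input.
    return num_pe_outputs + len(marked)


# Hypothetical device with three inputs, two of them fed by array inputs.
num_register_files = register_files_in_array_device(
    {"in0": "array", "in1": "device", "in2": "array"},
    num_pe_outputs=3,
)
```

For this hypothetical device the sketch yields 5 register files: 3 for the processing element outputs and 2 for the marked inputs.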

9. An array as claimed in claim 8, where one or more or all register files of all the processing devices 
of the array are of a same type, this type of register file comprising : 

(a) a shift register containing one or more register cells, the shift register having one input 
and as many outputs as there are register cells contained in the shift register, each 
register cell having one input and one output 

(b) a crossbar which is either partially or fully connected and which has as many inputs as 
there are register cells contained in the shift register and which has as many outputs as 
there are register file outputs 

and where 

(c) the register file input is connected to the input of the shift register 

(d) the input of the shift register is connected to the input of the first register cell of the shift 
register 

(e) the inputs and outputs of the register cells of the shift register are connected in such a 
way as to form a shift register 

(f) the output of each register cell of the shift register is connected to a shift register output 

(g) the output of any register cell of the shift register and the output of any other register 
cell of the shift register are not connected to a same shift register output 

(h) each shift register output is connected to an input of the crossbar 

(i) any shift register output and any other shift register output are not connected to a same 
output of the crossbar 

(j) each input of the crossbar is connected to a shift register output 

(k) any input of the crossbar and any other input of the crossbar are neither connected to 
the same shift register output nor to the register file input 

(l) each output of the crossbar is connected to a register file output 

(m) any output of the crossbar and any other output of the crossbar are not connected to a 
same register file output 

(n) each register file output is connected to a crossbar output 

(o) any register file output and any other register file output are not connected to a same 
crossbar output 

10. An array as claimed in claim 8, where one or more or all register files of all the processing devices 
of the array are of a same type, this type of register file comprising : 

(a) a shift register containing one or more register cells, the shift register having one input 
and as many outputs as there are register cells contained in the shift register, each 
register cell having one input and one output 

(b) a crossbar which is either partially or fully connected and which has as many inputs as 
the number obtained by incrementing by one the number of register cells contained in 
the shift register and as many outputs as there are register file outputs 

and where 

(c) the register file input is connected to the input of the shift register and to an input of the 
crossbar 

(d) the input of the shift register is connected to the input of the first register cell of the shift 
register 

(e) the inputs and outputs of the register cells of the shift register are connected in such a 
way as to form a shift register 

(f) the output of each register cell of the shift register is connected to a shift register output 

(g) the output of any register cell of the shift register and the output of any other register 
cell of the shift register are not connected to a same shift register output 

(h) each shift register output is connected to an input of the crossbar 

(i) any shift register output and any other shift register output are not connected to a same 
input of the crossbar 

(j) each input of the crossbar is connected either to a shift register output or to the register 
file input 

(k) any input of the crossbar and any other input of the crossbar are neither connected to 
the same shift register output nor to the register file input 

(l) each output of the crossbar is connected to a register file output 

(m) any output of the crossbar and any other output of the crossbar are not connected to a 
same register file output 

(n) each register file output is connected to a crossbar output 

(o) any register file output and any other register file output are not connected to a same 
crossbar output 

11. An array as claimed in claim 9, where the shift register of said type of register file contains at least 4 
register cells 

12. An array as claimed in claim 10, where the shift register of said type of register file contains at least 
4 register cells 

13. An array as claimed in claim 11, where said type of register file has at least two outputs 

14. An array as claimed in claim 12, where said type of register file has at least two outputs 